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Abstract 

Background: Tlie TGF-|3 signaling pathway is a fundamental pathway in the living cell, which plays a key role in 
nnany central cellular processes. The connplex and sonnetinnes contradicting nnechanisnns by which TGF-|3 yields 
phenotypic effects are not yet completely understood. In this study we investigated and compared the transcriptional 
response profile of TGF-|3l stimulation in different cell types. For this purpose, extensive experiments are performed 
and time-course microarray data are generated in human and mouse parenchymal liver cells, human mesenchymal 
stromal cells and mouse hematopoietic progenitor cells at different time points. We applied a panel of bioinformatics 
methods on our data to uncover common patterns in the dynamic gene expression response in respective cells. 

Results: Our analysis revealed a quite variable and multifaceted transcriptional response profile of TGF-pi stimulation, 
which goes far beyond the well-characterized classical TGF-|3l signaling pathway. Nonetheless, we could identify 
several commonly affected processes and signaling pathways across cell types and species. In addition our analysis 
suggested an important role of the transcription factor EGRl, which appeared to have a conserved influence across 
cell-types and species. Validation via an independent dataset on A549 lung adenocarcinoma cells largely confirmed our 
findings. Network analysis suggested explanations, howTGF-(3l stimulation could lead to the observed effects. 

Conclusions: The analysis of dynamical transcriptional response to TGF-|3 treatment experiments in different human 
and murine cell systems revealed commonly affected biological processes and pathways, which could be linked to 
TGF-|3l via network analysis. This helps to gain insights about TGF-|3 pathway activities in these cell systems and its 
conserved interactions between the species and tissue types. 
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Background 

The transforming growth factor-betal (TGF-|3l) signaling 
pathway is a fundamental pathway in the living cell, which 
plays a role in many central cellular processes. The TGF-|3 
superfamily contains over 30 different proteins, such as 
BMPs, Activins, Inhibins, and the TGF-|3l isoforms [1-3]. 
The pathway contributes to regulation of various cellular 
processes, such as apoptosis, cell differentiation, cell growth 
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as well as tumor suppression and immune regulation 
processes [4]. 

There are three TGF-|3 isoforms (TGF-|3l, TGF-|32, 
TGF-|33) which have different physiological and patho- 
logical effects on epithelial, endothelial, lymphatic, mye- 
loid and mesenchymal tissues [5]. The TGF-|3 pathway is 
one of the most studied pathways [6-10]. However, the 
complex and sometimes contradicting mechanisms by 
which TGF-|3 yields phenotypic effects is not yet com- 
pletely understood [5]. The classical TGF-|3l pathway is 
already well established since several years [7]. However, 
the identification of alternative signaling pathways that 
contain different receptors and Smad proteins has in- 
creased the overall complexity of the TGF-|3l signaling 
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pathway [11]. Additional file 1: Figure SI shows a simpli- 
fied cartoon sketch comprising mainly Smads cascades 
in the TGF-pl signaling pathway. 

In this study we investigated and compared down- 
stream effects of TGF-|3l stimulation on the dynamical 
response of gene expression in mouse and human in dif- 
ferent cell and tissue types. Two types of mouse 
hematopoietic progenitor cells were used: multipotent 
progenitor (MPP) and dendritic cell (DC) committed 
progenitors, referred to as common dendritic progenitor 
(CDP) cells. CDF differentiate from MPP and give rise 
to two types of DC: plasmacytoid DC (pDC) and con- 
ventional DC (cDC). MPP and CDP were obtained from 
bone marrow by in vitro culture with a specific cytokine 
cocktail and FACS sorting [12,13]. Furthermore, we 
employed human mesenchymal stromal cells (MSC), 
which differentiate into osteocytes, chondrocytes or adi- 
pocytes [14-16]. Finally, primary murine hepatocytes 
(HPC) and immortalized human hepatocytes (human 
HPC, HepG2) cells were used. We have taken these dif- 
ferent cell types for three reasons: (i) All these cells are 
highly responsive to TGF-|3. (ii) The different cell types 
reflect different degrees of differentiation, (iii) The differ- 
ent cells show a variable response to TGF-p. While in 
hepatocytes TGF-p induces apoptosis, multipotent pro- 
genitors initiate a differentiation programme in response 
to TGF-p. 

Very little and vague information is known about the 
detailed influence of TGF-pl in these different cell sys- 
tems. For example, TGF-pl is known to be necessary for 
MSC proliferation. It is essential for chondrogenic differ- 
entiation. On the other hand, TGF-pl participates in in- 
hibition of adipogenic and osteogenic differentiation. 
Furthermore, there are evidences, that TGF-pl contrib- 
utes to supporting myogenic differentiation of MSC 
[17-19]. There are also evidences that the TGF-p path- 
way play a role in the induction of cellular senescence 
in MSC [20]. Although TGF-pl triggers primary early 
responses (e.g. Smad activation) and EMT in human 
HPC (HepG2) cells, cell cycle arrest and apoptosis are 
generally not promoted by TGF-pl [21,22]. Further- 
more, TGF-pl is known to be crucial for development 
of Langerhans cells, the cutaneous contingent of migra- 
tory dendritic cells, both in vivo and in vitro and it evi- 
dently contributes in accelerating their differentiation 
and directing their subsets specification toward cDCs 
[12,23-25]. 

We used a panel of bioinformatics methods, ranging 
from statistical testing over functional and promoter se- 
quence analysis to clustering for pattern discovery in our 
gene expression time series data. Only one gene, the 
SKI-like oncogene {Skil), was commonly found to be 
differentially expressed (DE) in all cell types. Skil is a 
component of the SMAD -pathway, which regulates cell 



growth and differentiation. Moreover, SmadV that blocks 
TGF-p receptor activity seems to play a major common 
role, because it was identified as DE in most cell types. 
Despite of the differences on the level of individual 
genes we observed a conserved effect of TGF-pl stimu- 
lation on a number of biological processes and pathways. 
Moreover, we could identify a few overrepresented tran- 
scription factor binding sites, which were commonly 
found in several cell types. Specifically EGRl seems to 
have major relevance for the transcriptional stimulation 
response in mouse and human. 

By analysis of an independent dataset on human A549 
lung adenocarcinoma cells (CRL) from GEO (access No. 
GSE17708) [26] we were able to reproduce a highly sig- 
nificant proportion of the commonly identified biological 
processes, pathways and transcriptional factors in our 
datasets. Network analysis suggests explanations, how 
TGF-pl stimulation could lead to the observed effects. 

Results and discussion 

Time series transcriptome measurements 

All cell types were treated with TGF-p in three bio- 
logical replicates. TGF-p treatment concentrations were 
optimized in each cell type to show a maximal effect. 
Extracted RNA samples were hybridized to microarrays 
(Affymetrix Gene 1.0 ST) for genome-wide transcrip- 
tome analysis. Mouse progenitor cells and HepG2 cells 
were measured at 6 successive time points, mouse pri- 
mary HPC cells at 5, and human MSCs at 4 different 
time points. Additional file 2: Table SI gives an over- 
view of our experiments and the measured time-points, 
the "Methods" section gives details about cell cultures, 
stimulation, RNA-isolation and array hybridization in 
our experiments. 

Differential gene expression 

Transcriptional response is highly tissue specific on gene level 

We employed the "betr" method [27] to quantify the 
probability of differential expression of genes in whole 
time-courses (see Methods). Using this approach we 
were able to assess differential gene expression for each 
gene in each cell type in a comparable manner. We con- 
sidered a gene to have differential time-course expres- 
sion (DE), if it had a probability of >99% and was at least 
two-fold up- or down-regulated at one time point mini- 
mum (Additional file 1: Figures S2 a & b. Additional 
file 2: Tables S2 & S8). 

The strongest stimulatory effect of TGF-pl was ob- 
served in CDP cells (614 genes). Eight out of these 
genes in CDP are already known to play a role in the 
TGF-p pathway {TgfbS, SmadV, Thbsl, Tgfbrl, Smurfl, 
Smads, Smad6, Tgfbr2), In mouse HPC a significantly 
lower number of DE genes were found compared to 
other cell types. 
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We conducted set comparisons of DE genes across cell 
types. It is worth mentioning in this context that com- 
parisons between mouse and human genes were done 
on the basis of homologous genes (see Methods). Not 
surprisingly, the found overlap was particularly high 
among mouse hematopoietic progenitor cells (MPP and 
CDP). These were 173 genes, which equals a harmonic 
mean of above 41% of DE genes in both cell types 
(Additional file 1: Figure S2 a). Only two of these genes, 
namely Smad7 and Tgfbrl are known to play a role in 
the TGF-p pathway. Three genes {Lox, Pmepal, Skil) 
are found to be DE in all mouse cells (CDP, MPP and 
HPC). Pmepal (Prostate Transmembrane Protein) is 
known to interact with Smad and suppress the TGF-p 
pathway [28,29]. Only the protein-coding gene Skil 
(Ski-like-oncogene) that encodes a protein in the SMAD- 
pathway [30,31] was found to have a DE time-course in all 
cell types. In addition, the gene Smad7 was commonly 
found in all cell types except mouse HPC cells. 18 genes 
including RORl, ClOorflO, SMAD7, FSTL3, GADD45B, 
JUNB, ZFP36, 0LFM2, SPTLC3, IDl, LMCDl, SLC38A3, 
GXYLT2, SKIL, HESl, RASGEFIB, CITED2 and PDGFA 
were DE in all human cells (MSC, HepG2). The heatmaps 
in Additional file 1: Figure S3 visualize patterns of tem- 
poral behavior for particular groups of genes. Here again 
we see similarity in gene expressions between mouse 
dendritic cells. 

These findings on one hand stress the similarity of the 
transcriptional response in MPP and CDP, which is not 
very surprising given the fact that these cells were both 
derived from bone marrow. On the other hand they 
highlight that TGF-pl treatment affects by far not only 
genes within the canonical TGF-pl pathway, but leads 
to a large number of diverse secondary downstream ef- 
fects, which are only partially overlapping across differ- 
ent cell types. In other words there is a high tissue 
specificity of the transcriptional TGF-p stimulation re- 
sponse on the level of individual genes. 

TGF-pi pathway genes react time-dependant and tissue-specific 

We had a closer look at genes, which are known to play 
a role in the TGF-pl pathway, such as Bmp(s), Smad(s) 
and Id(s). In Figure 1 the log2 fold changes of 17 genes 
involved in the TGF-pl pathway, which are DE in at 
least one cell type, are depicted. It can be noticed that 
almost all genes show time-dependant transcriptional ef- 
fects. These effects are distinct between early and later 
time points, with moderate activities until 4 h and 
mostly higher activities at late times. It can also be 
noticed that cells of similar origin are more alike. For ex- 
ample, Bmp2, Bmp4, Bmp6, Cdkn2b and Comp are dys- 
regulated (i.e. significantly differ from 0 level according 
to "betr") only in human and not in mouse tissues. Fsl is 
similar to these genes, but also shows activity in mouse 



HPC. Idl in human cells is up-regulated at earlier time 
points and a down- regulated after 4 h. Inhba shows ac- 
tivity only in MSC cells where its expression after 1 hour 
constantly increases. Smad3, Smad6 and Smad? reveal 
similar time courses in mouse MPP and CDP cells and 
in human MSCs. Smad3 is increasingly down- regulated 
over time and the other two genes are always up- 
regulated. Smurfl is always over-expressed and shows a 
curve that is opposite to Smad3, Smad6 and SmadV. 
Tgfb3 is over-expressed at later time points in MPP and 
CDP cells and shows almost no activity in the other cell 
types. Thbsl is highly active in all cell types. However, 
while it is underrepresented in MPP and CDP, it shows el- 
evated expression in mouse and human HPC. Tgfbrl and 
Tgfbr2 behave similar, in particular in mouse progenitor 
cells, where Tgfbr2 is less up-regulated than Tgfbrl. 

Time point specific analysis confirms highly tissue specific 
expression changes on gene expression level 

In order to cross-validate our previous analysis, which con- 
siders time series as a whole, we conducted also a time-point 
specific analysis of differential gene expression using lin- 
ear models for microarray data (Limma). For this purpose 
we compared the gene expression at 4 hours after stimula- 
tion to the initial expression at time point 0. The time 
period of 4 hours was chosen because at least short-time 
relevant effects are expected in all cell types after this period. 

In the context of this time point specific analysis of 
transcriptional effects we considered a gene to be differen- 
tially expressed (DE) in a given cell type, if FDRBH < 0.01 
and the absolute fold changes was (logFC) > 1. The 
overlap analysis of DE genes at 4 h agrees with the 
time-course analysis. There are no or very few genes in 
common between the different cell types except in the case 
of mouse dendritic cells (Additional file 1: Figure S5 A). 
Moreover, the direction of regulation (up or down) differs 
between cell types (Additional file 2: Table S9). 

The heatmap in Additional file 1: Figure S4 depicts the 
log fold changes of all genes, which are DE in at least 
one cell type. The plot indicates two gene sets, which 
clearly show a similar behavior in mouse MPP and CDP 
cell types. The first set contains 36 genes that are over- 
expressed. The other set (42 genes) is under-expressed. 
Interestingly, the 36 genes being up-regulated in MPP 
and CDP cells are not regulated by TGF-pl in other cell 
types. Although not DE genes in every cell type, the 
genes SmadV, Pmepal (beside the gene Skil) seem to be 
up-regulated in all the cells. The rest of the genes are 
regulated in a rather cell-type specific manner. 

Cluster analysis reveals functionally similar gene groups 
in different cell types 

We conducted a time series cluster analyses in order to find 
groups of DE genes showing similar expression changes 
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Figure 1 Log2 fold-changes of 17 genes, which are DE in at least one cell type and are known to play a role in the TGF-P pathway 
(according to KEGG annotation). 



over time (observed as within cell-type temporal behavior 
shown in Additional file 1: Figures S3, between cell-types 
similarities shown in Additional file 1: Figure S4). The 
cluster analyses yielded 12 clusters in MPP and mouse 
HPC, 20 in CDF and 11 in human MSG (Additional file 2: 
Table S3). Genes contained in individual clusters can be 
found in Additional file 2: Table S14. Figure 2 depicts the 
mean curves for each of these clusters in each cell type. We 
investigated the functional similarity of genes across differ- 
ent clusters. For this purpose the R-Package "GOSemSim" 
[32] utilizing the semantic similarity measure proposed by 
Wang et al. [33] was employed. Semantic similarities are a 
means to compare GO annotations of gene pairs in a quan- 
titative manner, for example on the basis of the information 
content of GO terms. We refer the reader to [34] for an 
excellent overview. 

A heatmap depicting these GO semantic similarities 
suggested a high functional similarity of genes in several 
clusters from different cell types (Additional file 1: 
Figure S6 Additional file 2: Tables S15, SI 6). In particu- 
lar cluster B (MPP), and cluster B (CDP) are highly simi- 
lar to each other (semantic similarity > 0.7). Time-course 
log2 fold changes of the corresponding genes are shown 
in (Figure 3 top). As can be noticed the clusters are of 
different size, but have several genes in common (13 
genes). Functional analysis revealed that genes in these 
clusters are enriched for cell adhesion molecules (CAMs), 
valine, leucine and isoleucine biosynthesis, Pantothenate 



and CoA biosynthesis and regulation of cellular extrava- 
sation. Enrichment analysis was conducted here via the 
R-package GOstats [35], which employs a hypergeo- 
metric test taking into account the dependency structure 
among GO terms. 

The second group of functionally similar clusters (Figure 3, 
bottom) contains cluster K (human HPC) and cluster B 
(MSCs). Genes in these clusters play (among others) a 
role in TGF-^ and Notch signaling pathways (Additional 
file 2: Tables S15, S16). 

Taken together our cluster analyses showed that des- 
pite evident differences on the level of individual genes, 
functionally similar clusters of genes can be identified 
across cell types. 

Enrichment analysis reveals commonly affected biological 
processes, pathways and transcription factors in all cell types 

Motivated by our previous findings we asked, whether there 
were common functional patterns across all cell types. For 
this purpose we scanned GO terms and KEGG pathways 
for significant association with differential time course gene 
expression in each cell type (Additional file 2: Tables S4, 
SIO, Sll). 

Our analysis brought up 6 KEGG pathways and 11 GO 
terms, which were significantly associated to all cell types 
(FDR < 5%, Figure 4). The 6 KEGG pathways associated 
to all cell types were: Metabolic pathways, Glutathione 
metabolism, Lysosome, Purine metabolism. Peroxisome 
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Figure 2 Cluster mean-curves of log2 fold changes for the different cell types. For mouse HPC no clusters could be identified and hence all 
DE genes treated as one group. 



and PPAR signaling pathway. The 11 GO terms associated 
to all cells were: oxidation-reduction process, innate im- 
mune response, positive regulation of transcription from 
RNA polymerase II promoter, negative regulation of apop- 
totic process, angiogenesis, lipid metabolic process, positive 
regulation of cell proliferation, positive regulation of cell 
migration, proteolysis, positive regulation of transcription 
DNA-dependent and response to drug. The role of TGF-|3 
in apoptosis, cell proliferation as well as immune response 
is well known. Moreover, an effect of TGF-|3 perturbation 
on PPAR signaling has been described in skin fibroblasts 
[36]. In [37] the authors describe TGF-|3 mediated oxida- 
tive stress and decreased glutathione concentration in fi- 
brosis models. Finally, there is evidence that TGF-p has an 
effect on angiogensis and cell migration [38]. Hence, our 
findings largely fit to the current biological knowledge 
about TGF-|3. 

Conserved role of EGR1/2 transcription factors 

We analyzed DE genes with respect to overrepresented 
sequence motifs in their promoter regions with the 
XXmotif tool [39]. Significant motifs were then com- 
pared to known position weight matrices ( TRANS FAC) 
of transcription factors (TFs) via STAMP [40]. 

The analysis in each cell type predicted between 11 and 
21 regulating TFBS in the time-course analysis (Additional 
file 2: Tables S6, S17, SI 8), except for mouse HPC, where 
no overrepresented TFBS could be detected. This may be 
attributed to the small number of 16 DE genes in this cell 



type. Overlaps were particularly high within mouse MPP 
and GDP and within human cells. 

FOXPl, KROX, TEF, POU6F1, FOX and PITX binding 
sites were commonly identified in mouse MPP and GDP. 
KROX, HFH4 and PAX4 were found in all human cells. 
FOX, FOXPl, KROX and TEF were found to be them- 
selves representatives of DE genes. Figure 5 shows a net- 
work representation of all eight TFBS together with the 
set of DE genes containing respective binding sites. The 
plot reveals a relative clear difference between mouse 
and human cells with the exception of the KROX TFBS, 
which appears in all four cell types. KROX represents 
EGRl and EGR2. 

Network analysis suggests possible signal transduction 
pathways in mouse and human 

In order to better understand, how TGF-p may influence 
the commonly identified transcription factor, biological 
processes and the PPAR-pathway we conducted a network 
analysis. Using protein-protein interaction information 
from the BioGRID database [41] we reconstructed a 
mouse and a human specific network depicting dys- 
regulated paths from TGF-p to SKIL, SMAD7, EGRl as 
well as genes involved into glutathione metabolism, purine 
metabolism, PPAR signaling, oxidation-reduction process, 
innate immune response, negative regulation of apoptotic 
process, angiogenesis, positive regulation of cell prolifera- 
tion and positive regulation of cell migration (Figures 6 
and 7; see further details in Methods part). 
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Figure 3 Log2 fold changes of two groups of functionally similar clusters detected in different cell types. Genes appearing in more tlian 
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Figure 4 Clustered heatmaps of (a) the 6 common KEGG pathways and (b) 11 GO terms in different cell types. The color code indicates 
the degree of association (-log(FDR)) of a KEGG pathway and GO term to each cell type, respectively. 
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Figure 7 Murine protein-protein interaction networlc connecting Tgbfrl, Smad7, Slcil, Egri, Pparg with genes involved into commonly 

identified biological processes and pathways. The dashed green line indicates the putative transcriptional regulation of Smad7 and Tgfbrl by 

transcription factor Egrl. The darker the red color of a node the higher the average probability for differential time course expression. 
I J 



Our network analysis suggests pathways, by which TGF-p 
stimulation is possibly propagated via protein-protein inter- 
actions to our commonly identified biological processes. 
Due to the organism specificity of interactome information 
these pathways show certain differences: Far less protein- 
protein interactions are known in mouse than in human. In 
human, for example, negative regulation of apoptosis might 
be mediated via SMAD3 and SOX9 [42]. In contrast, the 
GO and network analysis in mouse suggests a direct 
role of TGFBRl. 

Enrichment of biological processes, pathways and tfbss is 
reproducible on independent dataset 

To validate the central finding from our data, namely 
the existence of commonly affected biological processes, 
pathways and transcription factors in all cell types, we 
downloaded an independent dataset comprising gene ex- 
pression data measured at 9 time points (0, 0.5, 1, 2, 4, 
8, 16, 24, 72 h) after TGF-p stimulation in human A549 
lung adenocarcinoma cell-lines (CRL, GSE17708). The 
dataset was analyzed in the same manner as described 
for our data before. High fractions of the 11 GO terms 
and 6 KEGG pathways commonly identified in all of 
our cell types were also found in GSE 17708 (Figure 8, 
Additional file 2: Table SIO). Out of the KEGG pathways 
and GO terms associated to all of our human cells 70% 
and 74%, respectively could be reproduced on the inde- 
pendent dataset (Figure 8, Additional file 2: Table Sll). 
Notably, 11 (61%) out of the 18 genes which exhibiting 



differential time courses in both our human MSG and 
HPC cells were found also to have differential time- 
courses in GSE17708 cells, these were RORh SMAD7, 
FSTL3, GADD4SB, JUNB, ZFP36, IDl, LMCDl, GXYLT2, 
SKIL and HESl. This corresponding fraction is signifi- 
cantly larger than expected by chance (p < lE-9, hyper- 
geometric test). 

The KROX TFBS (corresponding to TFs EGRl and 
EGR2), which was enriched in all of our cell types, was 
also found in GSE17708. Moreover, the other two TFBS 
that we identified in our human cells (HFH4, PAX4) 
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GO Terms 



o 
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All 
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Figure 8 Percentages of KEGG pathways (left) and GO terms 
(right) enriched commonly in our cell types that could be 
reproducibly identified in GSE17708. The numbers in tip of tlie 
bars are tlie p-value for tine null-liypotliesis to see tlie corresponding 
overlap just by chance (hyper-geometric test). 
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were also enriched in the A549 lung cancer cell line 
(Additional file 2: Tables S17). 

Taken together this analysis reveals a high reproduci- 
bility of our commonly identified biological processes, 
pathways as well as TFs. 

Conclusions 

We have conducted an in-depth comparison of the dy- 
namical TGF-|3l response profile on gene expression 
level across several cell types. Despite of a generally high 
degree of cell type specificity, there appears to be a com- 
mon functional response, which is conserved across 
cell types and species (i.e. mouse and human). Our ana- 
lysis suggests a common effect of TGF-|3l stimulation 
on apoptosis, cell proliferation, immune response, angio- 
genesis, cell migration, PPAR signaling, oxidative stress 
as well as purine and glutathione metabolism. Network 
analysis gives hints to possible pathways, by which these 
effects could be mediated. 

On the level of individual genes the 5'/<7-like oncogene 
and Smad7 were differentially expressed in most {SmadT) 
or all (5'/<7-like oncogene) cell types and thus appear to 
play a major role. Smad7 is involved into the canonical 
TGF-p pathway [43]. It is a general antagonist of the 
TGF-p family (for review see [44]). The 5'/<7-like onco- 
gene is a direct target gene of Smad2, which regulates 
its transcription [45]. It plays a role in cell growth and 
differentiation. 

Notably, a high fraction of the biological processes, 
pathways and TFBS that we identified to be enriched in 
all our cell types was found also in an independent data- 
set from a lung cancer cell line. This strengthens the 
confidence into our results. 

In summary our findings indicate that despite a high 
variability of transcriptional response across cell types 
and organisms there appears to be a set of commonly af- 
fected processes and pathways. In addition, the TFBS 
analysis suggested a major role of the transcription fac- 
tor EGRl in the TGF-p response in human and mouse. 
Indeed the induction of EGRl via TGF-p stimulation 
has been already reported earlier [46] and thus fits to 
the existing knowledge about TGF-pl - induced tran- 
scriptional response in other cell systems. 

Previous studies of TGF-p stimulation were mainly 
limited to one specific cell type (e.g. fibroblasts). In this 
paper we went beyond this point and conducted experi- 
ments in different cell types under as much as possible 
comparable conditions. In consequence we were able to 
compare transcriptional responses across cell types and 
organisms, which revealed common patterns. The iden- 
tification of common and specific signal transduction 
pathways that are affected by TGF-p in human and mice 
will allow us to define potential therapeutic targets and 
will further enable us to characterize gene expression 



patterns and complex regulatory networks. In addition, 
future work using our and other transcriptome data can, 
for example, address the identification of TGF-p 
dependent mesenchymal or epithelial gene signatures or 
the definition of cell specific cancer signatures. 

Methods 

Ethics statement 

Animal experiments required for obtaining murine 
MPP, GDP and HPC were approved by local authorities 
(Landesamt fiir Natur, Umwelt und Verbraucherschutz 
Nordrhein-Westfalen - LANUV NRW) in compliance 
with the German animal protection law. Experiments 
were performed at the Institute for Animal Research of 
the RWTH Aachen University Hospital, under projects 
10534A4 and 10718A4, entitied "Untersuchungen zur 
Hamatopoese aus adulten Blutstammzellen". Animal main- 
tenance, handling, and anesthesia were performed accord- 
ing to the Federation for Laboratory Animal Science 
Associations FELASA recommendations. 

All samples of human MSG cells were used after patient s 
written consent using guidelines approved by the Ethic 
Committee of the RWTH University of Aachen (Permit 
number: EK128/09). 

Mouse hematopoietic progenitor cells (MPP & CDP) 
Cell culture 

MPP and CDP were obtained from mouse bone marrow, 
using in vitro culture with a specific cytokine cocktail 
and FACS sorting [12,13]. 

TGF-pi stimulation 

After sorting, MPP and CDP were treated with 10 ng/ 
mL recombinant human TGF-pl (R&D Systems, Minne- 
apolis, USA) for 2, 4, 8, 12 and 24 h as described in [12] 
or left untreated. Cells were lysed in 350 \A TRI-Reagent 
and stored at -80°C. 

RNA isolation 

RNA was isolated using the MagMAX-96 for Microar- 
rays kit (Life Technologies, Darmstadt, Germany) ac- 
cording to manufacturer s protocol. 

Primary mouse hepatocytes and human HepG2 cells (HPC) 

Hepatocytes (HPC) represent the most prominent cell 
population in the liver. Primary HPC are sensitive to 
TGF-pl, and express the corresponding type I (ALK5), 
type II (TpRII), and type III (betaglycan) receptors. TGF- 
pl promotes cell cycle arrest and apoptosis of primary 
HPC. In addition, in vitro TGF-pl provokes epithelial- to- 
mesenchymal transition (EMT)-like processes in this hep- 
atic cell subpopulation, which most likely do not occur 
in vivo [47]. 
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HepG2 cells originate from a 15 year old child with 
primary hepatoblastoma [48]. They do secrete the major 
plasma proteins but do not express the hepatitis B virus 
surface antigen (HBsAg) [49]. 

Cell culture 

Primary murine HPC were isolated from male C57BL/6 
mice according to the collagenase method of Seglen 
[50]. Cells were plated in collagen coated 6-well dishes 
at a density of 1.2 x 10^ cells using HepatoZYME-SFM 
(Gibco, Life Technology, Darmstadt, Germany). Four 
hours after seeding the medium was renewed and cells 
were grown for a further 24 hrs culture period. 

HepG2 (DSMZ: DSM ACC180) were cultured in RPMI 
(PAA, Pasching, Austria) containing 10% fetal calf serum 
(PAA), 1 X Penicillin/ Streptomycin (Lonza, Cologne, 
Germany). Medium was renewed every second day. For 
the experiment, cells were passaged and plated in 6-well 
dishes using accutase (PAA) at a density of 4 x 10^ cells. 
One day before the experiment, cells were washed with 
PBS (Ix), medium changed to HepatoZYME-SFM (Gibco) 
and cultured for further 24 hrs. 

TGF-pi stimulation 

One hour before the experiment, the medium was ex- 
changed and cells stimulated with 1 ng/mL recombin- 
ant human TGF-|3l (R&D Systems, Minneapolis, USA) 
for indicated time intervals (HepG2: 0 min, 20 min, 
1 h, 2 h, 4 h, 24 h; murine HPC: 0 min, 1 h, 2 h, 4 h). 
The cells were harvested using Qiazol for cell lysis 
(Qiagen, Hilden, Germany), directly frozen and stored 
at -80°C. 

RNA isolation 

RNA was isolated using the RNeasy Kit system (Qiagen), 
performing a DNAse digestion according to the manu- 
facturer s protocol. 

Human Mesenchymal Stromal Cells (MSG) 

Mesenchymal stromal cells (MSC) are found in all sup- 
portive tissue as in fat tissue, bone marrow and cord. 
MSC are characterized by their plastic adherence and 
their differentiation potential towards adipogenic, osteo- 
genic and chondrogenic lineages. All MSC express the 
surface markers CD29, CD73, CD90 and CD 105 and 
they lack the expression of CD14, CD31, CD34 and 
CD45 [14,51]. 

Isolation and expansion 

MSCs were isolated from mononuclear cells (MNCs) by 
plastic adherence as described before [16,52,53]. In brief, 
bone fragments from the caput femoris of patients 
undergoing femoral head prosthesis were flushed with 
phosphate- buffered saline (PBS) and washed twice with 



PBS. MNC were then resuspended in culture medium 
and seeded into tissue culture flasks. Cells were cultured 
at 37°C in a humidified atmosphere with 5% CO2. The 
first medium exchange was performed after 48 h to re- 
move nonadherent cells. Thereafter, media changes were 
performed twice per week and MSCs were passaged 
when reaching 80-90% of confluence. 

TGF-pi stimulation 

MSC from three different donors were used in an early 
passage (p3-5) for stimulation with TGF-bl. 1x10^ MSC 
were seeded into 6-well culture plates. When the cells 
were attached after 24 h 1 ng/mL recombinant TGF-pl 
(R&D Systems) was added to the culture media at differ- 
ent time points. The cells were harvested at the same 
time point with Qiazol (Qiagen) and directly frozen and 
stored at -80°C. 

RNA isolation 

RNA was isolated via phenol/chloroform extraction 
using the miRNeasy Kit (Qiagen), performing a DNAse 
digestion. 

Genechip hybridization 

Human samples were assayed using Affymetrix Gene- 
Chip" type "Human Gene 1.0 ST Array" with 34,760 
probe-sets and mouse samples were assayed in Affyme- 
trix GeneChip type "Mouse Gene 1.0 ST Array" with 
32,321 probe-sets. Hybridization, wash and staining were 
done according to manufacturer s recommended stand- 
ard techniques. 

Normalization and preprocessing 

Raw probe intensities were normalized and summarized to 
expression levels using the FARMS algorithm which uti- 
lizes a factor analysis approach [54] . A rigorous quality as- 
sessment confirmed a fairly good quality of the chips with 
exception of mouse HPC chips where Initial chip quality 
assessment revealed a strong batch effect and one bad chip 
(replicate no. 1 at time 1 hour). The bad chip was excluded 
and batch adjustment was performed to alleviate that effect 
on those chips via the "ComBat" method [55]. 

Aff)^metrix probe IDs were mapped to Entrez gene IDs 
using the Bioconductor annotation packages "mogen- 
elOsttranscriptcluster.db" in mouse chips and "hugen- 
elOsttranscriptcluster.db" in human chips [56,57]. 

Differential gene expression 
Time point specific analysis 

Differential gene expression analyses via "Limma" Linear 
Models for Microarray Data [58] using empirical Bayes 
method [59] was performed by comparing samples at 
each time point after TGF-p stimulation to the unstimu- 
lated cells at time point 0. Statistical dependencies of 
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samples between time points and replicates were consid- 
ered via a factorial design matrix in "Limma" using a 
"time" and a "replicate" factor, and contrasts are consid- 
ered for interaction effects. Corrections for multiple test- 
ing was done using the Benjamini & Hochbergs method 
[60]. Significant differentially expressed genes are con- 
sidered those with FDRBH < 0.01 and absolute log2 fold 
change value (logFC) > 1. 

Analysis of whole time courses 

The small number of replicates in our experiments limits 
the power of statistical testing procedures for assessing 
differential gene expression at individual time points. Fur- 
thermore, the number of measured time points is not the 
same for each cell type, which complicates any further 
meta-analysis. Therefore, we employed the "betr" method 
to analyze whole time series at once [27]. The algorithm 
of this method uses a random-effects model together with 
the empirical Bayes method to estimate probabilities for 
differential expression of whole time courses. Genes were 
considered to be significant at a probability cutoff of> 
99% for the whole time-course analysis and absolute log2 
fold change (logFC) > 1. Since "betr" requires the same 
number of replicates per time point and one chip in 
mouse HPC had to be omitted due to low quality (see 
Normalization & Preprocessing) unfortunately in this par- 
ticular cell line we had to exclude time point 1 h com- 
pletely from the time-course analysis. 

Cluster analysis 

Clustering of gene expression time series was done via the 
MFDA method proposed in [61]. The method assumes 
gene expression time series within each given cluster to 
follow a mean curve plus some Gaussian noise. It decides 
cluster allocation via a Gaussian Mixture Model utilizing 
an EM algorithm for parameter estimation and decides 
the optimal number of clusters via the Bayesian Informa- 
tion Criterion (BIC). 

It is worth mentioning that we applied MFDA not on 
raw gene expression data here, but on log fold-changes 
relative to time point 0. The reason was that we wanted 
to group genes not on the basis of their absolute expres- 
sion values, but on the basis of similar response to the 
stimulus. 

KEGG and go analysis 

Analyses of pathways in KEGG [43] and biological pro- 
cesses in Gene Ontology project (GO) [62] were per- 
formed as follows: The -log(p-value) of all genes in the 
individual time point analysis and -log(l - probabilities) 
of all genes in the whole time-course analysis, respect- 
ively, were taken as a ranking score for each transcript. 
Gene sets of KEGG pathways and GO terms were then 
tested for their association with these ranking scores via 



a univariate logistic regression based test [63,64]. Result- 
ing p- values of KEGG pathways and GO terms were 
adjusted according to Benjamini & Yekutielis false dis- 
covery rate control under dependency [65], and signifi- 
cant KEGG pathways and GO terms reported at a 
FDRby cutoff value of 5%. 

Transcription factor binding sites analysis 

Analysis of transcription factors binding sites (TFBSs) 
was performed using the de novo sequence motif detec- 
tion method XXmotifs [39]. Identified sequence motifs 
were then aligned to known TRANS FAC TFBS via 
STAMP [66] and the top match is considered. The 
XXmotif method uses BLAST [67] all-against-all com- 
parisons to mask regions of local homology in order to 
avoid false positives. The method then performs an 
enrichment analysis after transforming the found pat- 
terns to position weight matrices (PWMs). The STAMP 
method utilizes a global or un-gapped local alignment to 
detect DNA motifs similarities to defined PWMs. Fur- 
thermore, it considers familial binding profiles, thus im- 
proving transcription factors (TF) classification accuracy. 
TFBSs analysis was done using these methods in each 
cell type for those genes, which according to the time- 
course analysis showed a probability of > 0.99% for dif- 
ferential expression. Promoter sequences of the genes 
under consideration (2Kbp upstream of transcription 
start site) were obtained from the Ensembl database [68] 
via "biomaRt" [69,70]. Only the top matching motif for 
each TRANSFAC TFBS was considered and significant 
TFBSs were reported at an E- Value threshold of le-3. 

Mapping of TFBS to individual transcription factors was 
performed via manual inspection of TRANSFAC PWMs. 
We obtained all proteins which had been used to con- 
struct each of the PWMs. With the help of the commer- 
cial software GeneGo Metacore® [71] we then mapped 
protein names to Entrez gene IDs. As a consequence we 
found for the TFBS FOXPl the differentially expressed 
genes Foxp2 and FoxPl (MPP, CDP mouse cells). For the 
TFBS FOX the human gene FAU was identified (human 
HPC). For KROX we found Egrl/EGRl (mouse HPC, hu- 
man MSG) and EGR2 (human MSG). For TEF we identi- 
fied Klf3/KLF3 (MPPs, CDPs, and human HPC), TRIM37 
and USP7 (human HPC, MSG). 

Identification of homologous genes 

Human homologs of mouse genes were identified via the 
KEGG Sequence Similarity Data Base (SSDB), which 
contains local alignments of amino acid sequences for 
protein coding genes from different species. We here 
considered two genes to be homologs, if the alignment 
E-value was below le-30 (bit-score > 112). In case of 
more than one homologous gene, all are considered. 
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Network analysis 

Information about protein-protein interactions was col- 
lected separately for human and mouse from the BioGRID 
database version 3.2.109 [41]. Correspondingly, a network 
comprising 16,011 nodes and 140,471 physical interactions 
was constructed for human. For mouse the network con- 
sisted of 6,233 nodes and 16,100 physical interactions. 
Nodes in these networks were weighted by the average 
probability (mean over all cell types from the same organ- 
ism) for differential time course expression according to 
the betr model (see above). A "distance" for each edge 
was then calculated as 2 minus the sum of its incident 
nodes' weights. Hence: the smaller the distance the higher 
the weight of its incident nodes. We used Dijkstras algo- 
rithm to search for minimum distance (i.e. maximum node 
weight) path connecting TGFBl with each of SKIL, 
SMAD7, EGRl, PPARG and all genes annotated to gluta- 
thione metabolism, purine metabolism, oxidation-reduction 
process, innate immune response, negative regulation of 
apoptotic process, angiogenesis, positive regulation of cell 
proliferation and positive regulation of cell migration. For 
each of the last mentioned terms we only kept those genes 
as representatives which showed the minimum distance to 
TGFBl. If there were several paths of the same minimum 
distance, all of them were considered. 

In the network for mouse Tgfbl was not identified and 
hence we started our analysis with Tgfbrl instead. 

Availability of supporting data 

The datasets supporting the results of this article are in- 
cluded within the article and its supplemental material files. 

In addition, our microarray datasets for the different 
cell types analyzed here are deposited in the Gene 
Expression Omnibus database [72] under accession num- 
bers GSE45942 (murine HPC), GSE45945 (human HFC), 
GSE46019 (human MSG) and GSE46109 (mouse MPP 
and CDF). 

Additional files 
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